Performance optimization: basic GPU-Driven rendering, Instance Culling, lighting optimizations, etc. #138
Merged
…owMaskPass 6.7ms->5ms)
…ryPass: 50% performance improvement (2.3ms->1.2ms)
…adowMaskPass: 90% performance improvement
name: RTAO Register Optimization
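overview: Reduce the RTAO shader's peak register usage through code reordering, a simplified RayQuery control flow, and elimination of runtime branches, raising occupancy and relieving MIOT stalls.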
todos:
- id: simplify-rayquery
content: "Optimization 1: simplify CastVisibilityRay by adding RAY_FLAG_FORCE_OPAQUE and removing the while loop and the candidate-handling branch"
status: completed
- id: reorder-ray-weight
content: "Optimization 2: reorder the loop body by moving the ray_weight computation before CastVisibilityRay, so that rand_vec dies earlier"
status: completed
- id: eliminate-branch
content: "Optimization 3: eliminate the sample_mode runtime branch by using a compile-time alternative"
status: completed
- id: blue-noise
content: "Optimization 4: replace the hash RNG with blue noise by adding a texture parameter, improving the BVH cache hit rate"
status: completed
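…ositePass to blend AO with SceneColor, so the AO pass itself no longer outputs SceneColor; optimize the AoPass data flow; remove SR-related code
…ulling, because enabling it currently makes performance worse···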
Next TODO:
Shadows
Code review results [AI]
Correctness bug (recommended to fix before committing)
The shader's instance culling logic is wrong: FrustumCull.comp.hlsl:103-125
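When a primitive has multiple instances and only some of them are visible, the shader merely sets `instance_cnt` to `visible_count` and does not reorder the instance buffer. `DrawIndexedIndirect` then draws the first `visible_count` instances starting at `first_instance`, not the instances that are actually visible. For example: of 5 instances, instances 0, 2 and 4 are visible → `visible_count=3` → the GPU draws instances 0, 1 and 2 (instance 1 should not be drawn, and the visible instance 4 is skipped).

Why there is no performance gain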
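Three reasons:
- `RestoreDrawCommands()`
- `vkCmdDrawIndexedIndirect` is used rather than `vkCmdDrawIndexedIndirectCount`
- A draw with `instance_cnt=0` is not truly free

Recommended optimization directions
- `CopyFrom(BufferView, BufferView)`: do the copy on the GPU side, so it no longer goes through the staging buffer every frame
- `vkCmdDrawIndexedIndirectCount`: the engine already implements this API (`DrawIndexedIndirectCnt`). The culling shader writes the visible draw count to a separate count buffer, so the GPU can skip culled draws entirely

Do you want me to implement these optimizations?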
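To make the instance culling bug from the review concrete, here is a minimal CPU-side C++ sketch. It is not the engine's code: `DrawResult`, `CullCountOnly` and `CullWithCompaction` are hypothetical names, and the GPU is modelled simply as "draw `instance_cnt` consecutive entries". The second function assumes one possible fix, in which the culling pass writes the visible instance indices contiguously into a remap list that the draw then reads.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical CPU-side model of one indirect draw (illustrative names only).
struct DrawResult {
    uint32_t instance_cnt;                  // value the culling pass writes into the draw command
    std::vector<uint32_t> drawn_instances;  // instances the GPU would end up rendering
};

// Variant described in the review: only instance_cnt is patched, so the GPU
// draws the first instance_cnt instances starting at first_instance.
DrawResult CullCountOnly(const std::vector<bool>& visible, uint32_t first_instance) {
    DrawResult r{0, {}};
    for (bool v : visible) {
        if (v) ++r.instance_cnt;
    }
    for (uint32_t i = 0; i < r.instance_cnt; ++i) {
        r.drawn_instances.push_back(first_instance + i);
    }
    return r;
}

// Assumed fix: visible instance indices are compacted into a contiguous list,
// and the draw fetches per-instance data through that list.
DrawResult CullWithCompaction(const std::vector<bool>& visible, uint32_t first_instance) {
    DrawResult r{0, {}};
    for (uint32_t i = 0; i < visible.size(); ++i) {
        if (visible[i]) r.drawn_instances.push_back(first_instance + i);
    }
    r.instance_cnt = static_cast<uint32_t>(r.drawn_instances.size());
    return r;
}
```

With the visibility pattern from the review (instances 0, 2 and 4 visible out of 5), both variants report `instance_cnt == 3`, but only the compacting variant selects instances 0, 2 and 4. The count-only variant selects 0, 1 and 2.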